[CI][Bench] Create summary reports for benchmarking CI run results #19733


Open

ianayl wants to merge 8 commits into base: sycl

Conversation

@ianayl ianayl commented Aug 6, 2025

It's much easier to figure out what's wrong with the benchmarking CI runs when the run itself tells you what's wrong immediately: https://github.com/intel/llvm/actions/runs/16789472825

ianayl commented Aug 11, 2025

Test runs are available; feel free to get an idea of what the summaries look like here:

@ianayl ianayl marked this pull request as ready for review August 11, 2025 18:56
@ianayl ianayl requested review from a team as code owners August 11, 2025 18:56
@sarnex sarnex left a comment


yaml lgtm


ianayl commented Aug 13, 2025

@intel/llvm-reviewers-benchmarking Friendly ping

if args.produce_github_summary:
    gh_summary.println("### Regressions")
    gh_summary.println(
        f"<details><summary>{len(regressions_ignored)} non CI-failing regressions:</summary>"
Contributor

Sorry, could you clarify what you meant by non CI-failing regressions?

Contributor Author

Hey, thanks for taking a look Udit.
The benchmark CI now also runs UR benchmarks, L0 benchmarks, etc.; regressions in e.g. L0 should not cause the nightly benchmarking CI for SYCL to fail, so they are filtered out and categorized separately.
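For illustration, a minimal sketch of that kind of suite-based filtering; the names, dict layout, and filter pattern here are hypothetical stand-ins, not the script's actual API:

```python
import re

# Hypothetical example: only suites matching the filter may fail the CI.
suite_filters = [re.compile(r"^sycl", re.IGNORECASE)]

all_regressions = [
    {"suite": "sycl", "name": "submit_kernel", "delta": 0.08},  # made-up entries
    {"suite": "l0", "name": "queue_create", "delta": 0.12},
]

regressions_of_concern = []  # matching suites: these can fail the nightly CI
regressions_ignored = []     # e.g. L0/UR regressions: reported but non-failing

for r in all_regressions:
    if any(f.match(r["suite"]) for f in suite_filters):
        regressions_of_concern.append(r)
    else:
        regressions_ignored.append(r)
```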

Contributor

I see. In that case, should we rename it to "non-SYCL regressions"?

Contributor

Also, we don't list "regressions" in the summary where delta is less than the noise threshold, correct?

@ianayl ianayl Aug 13, 2025

> I see. In that case, should we rename it to "non-SYCL regressions"?

I feel like that would be less confusing, but I'm also aware that other projects use this series of benchmarking scripts as well (e.g. UMF), so I was hesitant to hardcode "non-SYCL" into the titles/descriptions. In hindsight, this should perhaps be a customizable option.

> Also, we don't list "regressions" in the summary where delta is less than the noise threshold, correct?

That is correct. Noise is ignored.
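As a concrete example of the noise cutoff (the 3% threshold and helper name are assumptions for illustration, not values from the script):

```python
NOISE_THRESHOLD = 0.03  # assumed 3% threshold, for illustration only

def is_noise(baseline: float, measured: float) -> bool:
    # Relative deltas below the threshold are treated as noise and not listed.
    return abs(measured - baseline) / baseline < NOISE_THRESHOLD

print(is_noise(100.0, 102.0))  # True: a 2% delta is ignored as noise
print(is_noise(100.0, 110.0))  # False: a 10% delta counts as a regression
```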

"a CI failure: "
)
gh_summary.println(
f"<details><summary>{len(regressions_of_concern)} CI-failing regressions:</summary>"
Contributor

Same question: by "CI-failing regressions", do you mean regressions where the delta exceeds the noise threshold, thus causing the test to fail?

Contributor Author

The specific implementation is: if the benchmark matches a filter (i.e. the benchmark is a SYCL test) and there is a regression above the noise threshold, we fail the SYCL nightly benchmarking CI.
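A minimal sketch of that decision rule, assuming a hypothetical dict layout, suite prefix, and threshold (stand-ins for the script's actual data structures):

```python
def should_fail_ci(regressions, suite_prefix="sycl", noise_threshold=0.03):
    # Fail the nightly CI only if some regression both matches the filter
    # (i.e. it is a SYCL test) and exceeds the noise threshold.
    return any(
        r["suite"].startswith(suite_prefix) and r["delta"] > noise_threshold
        for r in regressions
    )

print(should_fail_ci([{"suite": "l0", "delta": 0.10}]))    # False: non-SYCL
print(should_fail_ci([{"suite": "sycl", "delta": 0.10}]))  # True: SYCL, above noise
print(should_fail_ci([{"suite": "sycl", "delta": 0.01}]))  # False: within noise
```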

Contributor

IMO, using "SYCL regressions" and "non-SYCL regressions" would be clearer.

@@ -46,6 +46,24 @@ class BenchmarkHistoricAverage:
# TODO Ensure ONEAPI_DEVICE_SELECTOR? GPU name itself?


class OutputFile:
Contributor

Also, just thinking out loud: do we really need this class? As an alternative, we could keep the GitHub summary in a string and, at the end, just append that string to the github_summary.md file.
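Roughly this, as a sketch of the suggested alternative (the variable names mirror the snippets above; the placeholder list is illustrative):

```python
# Sketch: accumulate the summary in a plain string instead of an OutputFile
# class, then append it to github_summary.md once at the end of the run.
regressions_ignored = []  # placeholder; populated elsewhere in the script

gh_summary = ""
gh_summary += "### Regressions\n"
gh_summary += (
    f"<details><summary>{len(regressions_ignored)} "
    "non CI-failing regressions:</summary>\n"
)

# ... build up the rest of the report ...

with open("github_summary.md", "a") as f:
    f.write(gh_summary)
```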

parser_avg.add_argument(
    "--produce-github-summary",
    action="store_true",
    help="Produce a github CI summary file.",
Contributor

IMO, the help message should also include the name of the file containing the GitHub summary, github_summary.md, since the file name is hard-coded.
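For example, something like the following (the `avg` subcommand name is assumed from the `parser_avg` variable; only the help text changes):

```python
import argparse

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()
parser_avg = subparsers.add_parser("avg")  # subcommand name assumed

parser_avg.add_argument(
    "--produce-github-summary",
    action="store_true",
    # Help text now names the hard-coded output file:
    help="Produce a github CI summary file (written to github_summary.md).",
)
```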
